Lydia: A System for Large-Scale News Analysis

Authors

  • Levon Lloyd
  • Dimitrios Kechagias
  • Steven Skiena
Abstract

Periodical publications represent a rich and recurrent source of knowledge on both current and historical events. The Lydia project seeks to build a relational model of people, places, and things through natural language processing of news sources and the statistical analysis of entity frequencies and co-locations. Lydia is still at a relatively early stage of development, but it is already producing interesting analysis of significant volumes of text. Indeed, we encourage the reader to visit our website (http://www.textmap.com) to see our analysis of recent news obtained from over 500 daily online news sources.

Perhaps the most familiar news analysis system is Google News [1], which continually monitors 4,500 news sources. Applying state-of-the-art techniques in topic detection and tracking, it clusters articles by event, combines these clusters into groups of articles about related events, categorizes each event into predetermined top-level categories, and finally selects a single representative article for each cluster. A notable academic project along these lines is Columbia University's Newsblaster [2,4,8], which goes further by providing computer-generated summaries of the day's news from the articles in a given cluster.

Our analysis is quite different. We track the temporal and spatial distribution of the entities in the news: who is being talked about, by whom, when, and where? Section 2 describes more clearly the nature of the news analysis we provide, and presents some global analysis of articles by source and type to demonstrate the power of Lydia.

Lydia is designed for high-speed analysis of online text. We seek to analyze thousands of curated text feeds daily. Lydia is capable of retrieving a daily newspaper like The New York Times and then analyzing the resulting stream of text in under one minute of computer time. We are capable of processing the entire 12 million abstracts of Medline/PubMed in roughly two weeks on a single computer, covering virtually every paper of biological or medical interest published since the 1960s. A block diagram of the Lydia processing pipeline appears in Figure 1. The major phases of our analysis are:


Similar resources

Thesis Proposal: Interpreting News through the Science of Networks

On-line news sources provide a large and comprehensive corpus of world events and news entities. The Lydia project (www.textmap.com) analyzes over a thousand on-line newspapers every day to discover news trends, sentiments, and geographic biases. The aim of the project is to deliver news analysis on a scale of content that would be impossible for a person to read, and to mine the data to discov...

Large-Scale Sentiment Analysis for News and Blogs (system demonstration)

News can be good or bad, but it is seldom neutral. Although full comprehension of natural language text remains well beyond the power of machines, the statistical analysis of relatively simple sentiment cues can provide a surprisingly meaningful sense of how the latest news impacts important entities. Here we demonstrate our large-scale sentiment analysis system for news and blog entities built...

Financial Analysis Using News Data

This paper surveys the field of financial analysis using news data, and also our own considerable work on movie gross analysis, to show the predictive power of news data. Since the 1990s, linguistic sources such as news have repeatedly been shown to carry additional, meaningful information beyond traditional quantitative finance data, and thus they can be used as predictive indicators in...

News and Blog Analysis with Lydia

The Lydia project seeks to build a relational model of people, places, and things through natural language processing of news sources and the statistical analysis of entity frequencies and co-locations. Our analysis is quite different from Google News. We track the temporal and spatial distribution of the entities in the news: who is being talked about, by whom, when, and where? Please visit ou...

Integrating Semantic Video Understanding and Knowledge Visualization for Large-Scale News Video Exploration

In this paper, we have developed a novel framework to enable more effective visual analysis and exploration of large-scale news videos via knowledge visualization. A novel interestingness measurement for video news reports is proposed to enable analysts and general audiences to find news stories of interest at first glance and catch the valuable knowledge in large-scale video news databases. Ke...

Publication date: 2005